Abstract: This paper presents a new analysis for the orthogonal
matching pursuit (OMP) algorithm. It is shown that if the restricted
isometry property (RIP) is satisfied at sparsity level $O(\bar{k})$,
then OMP can stably recover a $\bar{k}$-sparse signal in 2-norm under
measurement noise. For compressed sensing applications, this result
implies that in order to uniformly recover a $\bar{k}$-sparse signal
in $\mathbb{R}^d$, only $O(\bar{k} \ln d)$ random projections are
needed. This analysis improves some earlier results on OMP depending
on stronger conditions that can only be satisfied with
$\Omega(\bar{k}^2 \ln d)$ or $\Omega(\bar{k}^{1.6} \ln d)$ random
projections.

Index Terms — Estimation theory, feature selection, greedy 
algorithms, statistical learning, sparse recovery 



I. Introduction

Consider a signal $\bar{x} \in \mathbb{R}^d$, and suppose that we observe its
linear transformation plus measurement noise as:

$$y = A\bar{x} + \text{noise}.$$

Here, $A$ is an $n \times d$ matrix. If we define an objective function

$$Q(x) = \|Ax - y\|_2^2, \tag{1}$$

then we may estimate the parameter $\bar{x}$ by minimizing $Q(x)$,
subject to appropriate constraints.

If $d > n$, then the solution of the unconstrained optimization
problem

$$\min_{x \in \mathbb{R}^d} Q(x) \tag{2}$$

is not unique. In order to estimate $\bar{x}$, additional assumptions
on $\bar{x}$ are necessary. We are specifically interested in the case
where $\bar{x}$ is sparse, that is, $\|\bar{x}\|_0 \ll n$, where

$$\|x\|_0 = |\operatorname{supp}(x)|, \qquad \operatorname{supp}(x) = \{j : x_j \neq 0\}.$$

It is known that under appropriate conditions, it is possible to
recover $\bar{x}$ by solving (2) with a sparsity constraint as follows:

$$\min_{x \in \mathbb{R}^d} Q(x) \quad \text{subject to } \|x\|_0 \le k. \tag{3}$$



However, this optimization problem is generally NP-hard.
Therefore one seeks computationally efficient algorithms that
can approximately solve (3), with the goal of recovering the sparse
signal x. This paper considers the popular orthogonal matching 
pursuit algorithm (OMP), which has been widely used for this 
purpose (for example, see [5), lfl4l . |Q3)). We are specifically 
interested in two issues: the performance of OMP in terms 
of optimizing Q(x) and the performance of OMP in terms of 
recovering the sparse signal x. 
II. Main Result

Our analysis considers a more general objective function
$Q(x)$ that does not necessarily take the quadratic form in
(1). However, we assume that $Q(x)$ is convex. For such a
general convex objective function, we consider the fully (or
totally) corrective greedy algorithm in Figure 1, which was
analyzed in [13]. This paper refines the analysis to show that
the algorithm works under the restricted isometry property
(RIP) of [3] (the required condition will be described later in
this section). This algorithm is a direct generalization of OMP,
which has traditionally been considered only for the quadratic
objective function in (1) with $F^{(0)} = \emptyset$. For simplicity, we
assume that the number of iterations $k_0$ is chosen a priori. The
algorithm has been known in the machine learning community
as a version of boosting [16], and has also been proposed
recently in the signal processing community [2].
Fig. 1. Fully Corrective Greedy Boosting Algorithm (OMP)

For quadratic loss, the objective function $Q(x)$ is given
by (1) and its derivative is $\nabla Q(x) = 2A^\top (Ax - y)$.
Therefore $j = \arg\max_i |\nabla Q(x^{(k-1)})_i|$ becomes
$j = \arg\max_i |a_i^\top (Ax^{(k-1)} - y)|$, where $a_i$ is the $i$-th
column of matrix $A$. This, together with $F^{(0)} = \emptyset$, leads to
the standard OMP algorithm. In order to use notation consistent with the
sparse recovery literature, in the current paper, we still refer to the
more general algorithm in Figure 1 as OMP even though it
applies to objective functions other than (1).
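
The selection rule for the quadratic case can be sketched in code. The following is a minimal illustration rather than a reproduction of Figure 1: it assumes the quadratic objective (1), an empty initial support, and linearly independent selected columns of $A$, so that the restricted least-squares problem has a unique solution. The names `omp` and `solve_least_squares` and the stopping tolerance `tol` are hypothetical choices that do not appear in the paper.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve_least_squares(A: Matrix, y: Vector, support: Set[int]) -> Vector:
    """Minimize ||A x - y||_2^2 over x with supp(x) inside `support`.

    Assumption: the selected columns of A are linearly independent.
    """
    idx = sorted(support)
    n, m = len(A), len(idx)
    # Normal equations G c = b with G = A_S^T A_S and b = A_S^T y.
    G = [[sum(A[r][idx[p]] * A[r][idx[q]] for r in range(n)) for q in range(m)]
         for p in range(m)]
    b = [sum(A[r][idx[p]] * y[r] for r in range(n)) for p in range(m)]
    # Gaussian elimination with partial pivoting.
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(G[r][c]))
        G[c], G[piv] = G[piv], G[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, m):
            f = G[r][c] / G[c][c]
            for q in range(c, m):
                G[r][q] -= f * G[c][q]
            b[r] -= f * b[c]
    # Back substitution.
    coef = [0.0] * m
    for c in reversed(range(m)):
        tail = sum(G[c][q] * coef[q] for q in range(c + 1, m))
        coef[c] = (b[c] - tail) / G[c][c]
    x = [0.0] * len(A[0])
    for p, j in enumerate(idx):
        x[j] = coef[p]
    return x


def omp(A: Matrix, y: Vector, num_iters: int,
        tol: float = 1e-12) -> Tuple[Vector, Set[int]]:
    """Pick the coordinate with the largest gradient magnitude, then re-fit."""
    n, d = len(A), len(A[0])
    support: Set[int] = set()
    x = [0.0] * d
    for _ in range(num_iters):
        residual = [sum(A[r][c] * x[c] for c in range(d)) - y[r]
                    for r in range(n)]
        # Gradient of ||A x - y||_2^2 is 2 A^T (A x - y).
        grad = [2.0 * sum(A[r][c] * residual[r] for r in range(n))
                for c in range(d)]
        j = max(range(d), key=lambda c: abs(grad[c]))
        if abs(grad[j]) <= tol:  # illustrative stopping rule, not from the paper
            break
        support.add(j)
        # Fully corrective step: re-minimize Q over the whole current support.
        x = solve_least_squares(A, y, support)
    return x, support
```

The re-fit over the whole support after each selection is what distinguishes the fully corrective variant from plain matching pursuit, which would only update the newly selected coordinate.
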

The general problem of optimization under a sparsity constraint
is NP-hard. In order to alleviate the difficulty, we
consider approximate optimization under the restricted strong
convexity assumption introduced below.

Definition 2.1 (Restricted Strong Convexity Constants):
Given any $s > 0$, define restricted strong convexity constants
$\rho_-(s)$ and $\rho_+(s)$ as follows: for all $\|x - x'\|_0 \le s$, we require

$$\rho_-(s)\|x - x'\|_2^2 \le Q(x') - Q(x) - \nabla Q(x)^\top (x' - x) \le \rho_+(s)\|x - x'\|_2^2.$$

If the objective function takes the quadratic form given by
(1), then the above definition is equivalent to the following
sparse eigenvalue condition of $A^\top A$: $\forall \Delta x \in \mathbb{R}^d$ such that
$\|\Delta x\|_0 \le s$,

$$\rho_-(s)\|\Delta x\|_2^2 \le \|A \Delta x\|_2^2 \le \rho_+(s)\|\Delta x\|_2^2. \tag{4}$$

In this case, the constants $\rho_-(s)$ and $\rho_+(s)$ are closely related
to the restricted isometry constant $\delta_s$ in [3], which is defined
as a constant that satisfies the condition that $\forall \Delta x \in \mathbb{R}^d$ such
that $\|\Delta x\|_0 \le s$:

$$(1 - \delta_s)\|\Delta x\|_2^2 \le \|A \Delta x\|_2^2 \le (1 + \delta_s)\|\Delta x\|_2^2.$$
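
To make the relation between the sparse eigenvalue constants and the restricted isometry constant concrete, the sketch below computes $\rho_-(2)$ and $\rho_+(2)$ for a small matrix by brute force over all pairs of columns, using the closed-form eigenvalues of each $2 \times 2$ Gram block. It is only an illustration for $s = 2$; the function names are hypothetical, and the last helper assumes $\delta_s$ is taken to be the smallest constant satisfying the two-sided inequality.

```python
import math
from itertools import combinations
from typing import List, Tuple


def sparse_eigenvalue_bounds_s2(A: List[List[float]]) -> Tuple[float, float]:
    """Return (rho_minus(2), rho_plus(2)) for Q(x) = ||A x - y||_2^2.

    Brute force over all pairs of columns; assumes A has at least two columns.
    """
    n, d = len(A), len(A[0])
    lo, hi = math.inf, -math.inf
    for i, j in combinations(range(d), 2):
        a = sum(A[r][i] * A[r][i] for r in range(n))
        b = sum(A[r][i] * A[r][j] for r in range(n))
        c = sum(A[r][j] * A[r][j] for r in range(n))
        # Eigenvalues of the 2x2 Gram block [[a, b], [b, c]].
        mid = 0.5 * (a + c)
        rad = math.hypot(0.5 * (a - c), b)
        lo, hi = min(lo, mid - rad), max(hi, mid + rad)
    return lo, hi


def rip_constant_from_bounds(rho_minus: float, rho_plus: float) -> float:
    """Smallest delta with 1 - delta <= rho_minus and rho_plus <= 1 + delta."""
    return max(1.0 - rho_minus, rho_plus - 1.0)
```

For two unit-norm columns with inner product 0.6, this gives bounds close to 0.4 and 1.6, so the ratio $\rho_+(2)/\rho_-(2)$ is about 4 even though the corresponding isometry constant is only 0.6.
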

The restricted isometry constant was used to define the restricted
isometry property (RIP) in the analysis of the $L_1$ regularization
method [3]. We employ the slightly more general
restricted strong convexity constants in (4) because our analysis
only requires the ratio $\rho_+(s)/\rho_-(s)$ to be bounded, and
this is useful for general machine learning problems where
$\rho_+(s)$ can be larger than 2.

In order to recover the target $\bar{x}$, we have to assume that
$\bar{x}$ is sparse and approximately optimizes $Q(x)$. If a target
$\bar{x}$ is an exact global optimal solution, then $\nabla Q(\bar{x}) = 0$.
However, this paper deals with approximate optimal solutions,
where $\nabla Q(\bar{x}) \approx 0$. In particular, we introduce the following
definition, which is convenient to apply.

Definition 2.2 (Restricted Gradient Optimal Constant):
Given $\bar{x} \in \mathbb{R}^d$ and $s > 0$, we define the restricted gradient
optimal constant $\epsilon_s(\bar{x})$ as the smallest non-negative value that
satisfies the following condition

$$|\nabla Q(\bar{x})^\top u| \le \epsilon_s(\bar{x}) \|u\|_2$$

for all $u \in \mathbb{R}^d$ such that $\|u\|_0 \le s$.

The constant $\epsilon_s(\bar{x})$ measures how close $\nabla Q(\bar{x})$ is to zero.
If $\nabla Q(\bar{x}) = 0$, then $\epsilon_s(\bar{x}) = 0$. If $\nabla Q(\bar{x}) \approx 0$, then $\epsilon_s(\bar{x})$ is
small. Moreover, similar to the definition of restricted strong
convexity constants, we are only interested in the value of $\nabla Q(\bar{x})$
in any subset of $\{1, \ldots, d\}$ with $s$ elements. The following
proposition provides some estimates of $\epsilon_s(\bar{x})$ using quantities
that are easier to understand.

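Proposition 2.1: We have $\epsilon_s(\bar{x}) \le \sqrt{s}\,\|\nabla Q(\bar{x})\|_\infty$ and
$\epsilon_s(\bar{x}) \le \|\nabla Q(\bar{x})\|_2$. Moreover, if

$$Q(\bar{x}) \le \inf_{\|x\|_0 \le \|\bar{x}\|_0 + s} Q(x) + \epsilon,$$

then

$$\epsilon_s(\bar{x}) \le 2\sqrt{\rho_+(s)\,\epsilon}.$$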



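Proof: The first two inequalities are straightforward. For
the third inequality, we note that for $\|u\|_0 \le s$:

$$\begin{aligned}
Q(\bar{x}) - \epsilon &\le \inf_{\|x\|_0 \le \|\bar{x}\|_0 + s} Q(x) \\
&\le \inf_{\eta} Q(\bar{x} + \eta u) \\
&\le \inf_{\eta} \left[ Q(\bar{x}) + \eta \nabla Q(\bar{x})^\top u + \rho_+(s)\eta^2 \|u\|_2^2 \right] \\
&= Q(\bar{x}) - |\nabla Q(\bar{x})^\top u|^2 / (4\rho_+(s)\|u\|_2^2).
\end{aligned}$$

The result follows by rearranging the above inequality. ■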
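
As a small numerical illustration of Definition 2.2 and the first two bounds of Proposition 2.1: by the Cauchy-Schwarz inequality applied on the support of $u$, the smallest constant satisfying the definition equals the 2-norm of the $s$ largest-magnitude entries of the gradient. The sketch below uses that closed form; the function name and the example gradient vector are made up for illustration.

```python
import math
from typing import Sequence


def restricted_gradient_constant(grad: Sequence[float], s: int) -> float:
    """2-norm of the s largest-magnitude entries of the gradient vector."""
    largest_squares = sorted((g * g for g in grad), reverse=True)[:s]
    return math.sqrt(sum(largest_squares))


grad = [3.0, -4.0, 1.0, 0.0]   # hypothetical value of the gradient at the target
s = 2
eps_s = restricted_gradient_constant(grad, s)
bound_inf = math.sqrt(s) * max(abs(g) for g in grad)   # sqrt(s) * sup-norm
bound_two = math.sqrt(sum(g * g for g in grad))        # 2-norm
```

Here `eps_s` is never larger than either `bound_inf` or `bound_two`, in line with the proposition.
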
The following theorem is the main result of this paper,
which shows that OMP can approximately recover a sparse
signal $\bar{x}$ in 2-norm if the condition (5) in the theorem involving
strong convexity constants can be satisfied. As we shall discuss
later, this condition is closely related to the RIP condition for
the quadratic objective (1).

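Theorem 2.1: Consider the OMP algorithm. Let $\bar{x} \in \mathbb{R}^d$
and $\bar{F} = \operatorname{supp}(\bar{x})$. If there exists $s$ such that

$$s \ge |\bar{F} \cup F^{(0)}| + 4|\bar{F} \setminus F^{(0)}| \frac{\rho_+(1)}{\rho_-(s)} \ln \frac{20\rho_+(|\bar{F} \setminus F^{(0)}|)}{\rho_-(s)}, \tag{5}$$

then when $k = k_0 = s - |\bar{F} \cup F^{(0)}|$, we have

$$Q(x^{(k)}) \le Q(\bar{x}) + 2.5\,\epsilon_s(\bar{x})^2/\rho_-(s)$$

and

$$\|x^{(k)} - \bar{x}\|_2 \le \sqrt{6}\,\epsilon_s(\bar{x})/\rho_-(s).$$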



Proof: The detailed proof relies on a number of technical
lemmas that are left to the appendix.

The first inequality of the theorem is a direct consequence
of Lemma A.5. The second inequality is a consequence of the
first inequality and Lemma A.2:

$$\rho_-(s)\|x^{(k)} - \bar{x}\|_2^2 \le 2\left[Q(x^{(k)}) - Q(\bar{x})\right] + \epsilon_s(\bar{x})^2/\rho_-(s) \le 6\,\epsilon_s(\bar{x})^2/\rho_-(s).$$

This implies the second inequality. ■

Note that (5) can be satisfied as long as
$(\rho_+(1)/\rho_-(s)) \ln(\rho_+(\bar{k})/\rho_-(s))$, with $\bar{k} = \|\bar{x}\|_0$, grows sub-linearly as a
function of $s$. With appropriate assumptions, this allows
the ratio $\rho_+(s)/\rho_-(s)$ to be significantly larger than 1 but
bounded from above (such a condition is sometimes referred
to as a sparse eigenvalue condition in the statistics literature).
In this context, Theorem 2.1 is useful for estimation problems
encountered in machine learning, where $\rho_+(s)/\rho_-(s)$ may
be large.

In compressed sensing, one can often control the ratio
$\rho_+(s)/\rho_-(s)$ to be not much larger than 1 using random
projection. In this context, the following result gives a simpler
interpretation of the above theorem, where the condition (5)
of the theorem is replaced by $\rho_+(\bar{k}) \le 2\rho_-(31\bar{k})$.

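Corollary 2.1: Consider the OMP algorithm with $F^{(0)} = \emptyset$.
Let $\bar{x} \in \mathbb{R}^d$ and $\bar{k} = \|\bar{x}\|_0$. If the condition $\rho_+(\bar{k}) \le 2\rho_-(31\bar{k})$
holds, then when $k = k_0 = 30\bar{k}$, we have

$$Q(x^{(k)}) \le Q(\bar{x}) + 2.5\,\epsilon_s(\bar{x})^2/\rho_-(s)$$

and

$$\|x^{(k)} - \bar{x}\|_2 \le \sqrt{6}\,\epsilon_s(\bar{x})/\rho_-(s),$$

where $s = 31\bar{k}$.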

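Proof: If $\rho_+(\bar{k}) \le 2\rho_-(31\bar{k})$ holds, then we can let $s = 31\bar{k}$,
which implies that

$$2 \ge \rho_+(\bar{k})/\rho_-(s) \ge \rho_+(1)/\rho_-(s).$$

Therefore

$$s = 31\bar{k} \ge \bar{k} + 4\bar{k} \cdot 2 \ln(20 \cdot 2) \ge \bar{k} + 4\bar{k}\,(\rho_+(1)/\rho_-(s)) \ln(20\rho_+(\bar{k})/\rho_-(s)).$$

Since $F^{(0)} = \emptyset$ gives $|\bar{F} \cup F^{(0)}| = |\bar{F} \setminus F^{(0)}| = \bar{k}$, this means that
the condition (5) holds with $k = s - \bar{k} = 30\bar{k}$, and the corollary
follows directly from Theorem 2.1. ■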
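
The constants 30 and 31 in the corollary come from a short calculation that is easy to check numerically. The sketch below evaluates the right-hand side of condition (5) in the worst case allowed by the corollary, where both ratios equal 2; the function name and argument names are illustrative and not from the paper.

```python
import math


def condition_5_holds(s: float, size_union: float, size_diff: float,
                      ratio_1: float, ratio_diff: float) -> bool:
    """Check s >= |union| + 4 |diff| * ratio_1 * ln(20 * ratio_diff).

    ratio_1 stands for rho_plus(1) / rho_minus(s) and ratio_diff for
    rho_plus(|diff|) / rho_minus(s); both are supplied by the caller.
    """
    rhs = size_union + 4.0 * size_diff * ratio_1 * math.log(20.0 * ratio_diff)
    return s >= rhs


# Worst case of the corollary: both ratios equal 2, empty initial support.
per_unit_sparsity = 1.0 + 4.0 * 2.0 * math.log(20.0 * 2.0)   # about 30.51
k_bar = 7
ok_at_31 = condition_5_holds(31 * k_bar, k_bar, k_bar, 2.0, 2.0)
ok_at_30 = condition_5_holds(30 * k_bar, k_bar, k_bar, 2.0, 2.0)
```

Since $1 + 8 \ln 40 \approx 30.51$, the choice $s = 31\bar{k}$ satisfies the condition while $s = 30\bar{k}$ would not in this worst case, which is why the sparsity level is $31\bar{k}$ and the iteration count is $30\bar{k}$.
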

For the quadratic objective (1), the condition $\rho_+(\bar{k}) \le 2\rho_-(31\bar{k})$
is analogous to the RIP condition in [3]. In particular,
if the matrix $A$ has the restricted isometry constant
$\delta_{31\bar{k}} \le 1/3$, then the conditions $\rho_+(\bar{k}) \le 4/3$ and $\rho_-(31\bar{k}) \ge 2/3$
hold, with $\rho_+(s)$ and $\rho_-(s)$ defined according to (4). In
this case, Corollary 2.1 can be directly applied.

It is interesting to observe that except for constants, the
result of this paper for OMP is as strong as those for
more sophisticated greedy algorithms such as ROMP [11] or
CoSaMP [10]. For example, Corollary 2.1 can be applied when
$\delta_s \le 1/3$ with $s = 31\bar{k}$, while a similar result for CoSaMP
in [10] applies when $\delta_s \le 0.1$ with $s = 4\bar{k}$. Nevertheless,
the difference in the constants may still suggest possible
advantages for more complex algorithms such as CoSaMP
under suitable conditions.

For the quadratic objective function, a simple instantiation of
$\epsilon_s(\bar{x})$ using Proposition 2.1 leads to the following sparse
recovery result that is relatively simple to interpret.

Corollary 2.2: Consider the quadratic objective function
$Q(x) = \|Ax - y\|_2^2$ of (1), and the OMP algorithm with
$F^{(0)} = \emptyset$. Consider an arbitrary vector $\bar{x} \in \mathbb{R}^d$ and let
$\bar{k} = \|\bar{x}\|_0$. If the RIP condition $\rho_+(\bar{k}) \le 2\rho_-(31\bar{k})$ holds,
then when $k = k_0 = 30\bar{k}$, we have

$$\|x^{(k)} - \bar{x}\|_2 \le 2\sqrt{6}\,\rho_+(s)^{1/2}\,\|A\bar{x} - y\|_2/\rho_-(s),$$

where $s = 31\bar{k}$.
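
To get a feel for the size of the constant in Corollary 2.2, the sketch below evaluates the right-hand side for given values of $\rho_+(s)$, $\rho_-(s)$ and the residual norm $\|A\bar{x} - y\|_2$. The function name is hypothetical, and the example values $\rho_+(s) = 4/3$ and $\rho_-(s) = 2/3$ are the extreme values permitted when the restricted isometry constant at level $s$ is at most $1/3$.

```python
import math


def corollary_2_2_bound(rho_plus_s: float, rho_minus_s: float,
                        residual_norm: float) -> float:
    """Evaluate 2 * sqrt(6) * sqrt(rho_plus(s)) * ||A x_bar - y||_2 / rho_minus(s)."""
    return 2.0 * math.sqrt(6.0) * math.sqrt(rho_plus_s) * residual_norm / rho_minus_s


# Extreme sparse eigenvalues allowed by an isometry constant of 1/3 at level s.
bound = corollary_2_2_bound(4.0 / 3.0, 2.0 / 3.0, residual_norm=1.0)
```

With these values the bound equals $6\sqrt{2} \approx 8.49$ times the noise level, so under this RIP assumption the 2-norm recovery error is within a modest constant factor of $\|A\bar{x} - y\|_2$.
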



III. Discussion 



In this paper we proved a new result for a generalization of
the OMP algorithm. It is shown that if the RIP is satisfied at
sparsity level $O(\bar{k})$, then OMP can recover a $\bar{k}$-sparse signal
in 2-norm. For compressed sensing applications, this result
implies that in order to uniformly recover a $\bar{k}$-sparse signal in
$\mathbb{R}^d$, only $n = O(\bar{k} \ln d)$ random projections are needed [3].

Our result for signal recovery is stronger than previous
results for OMP that relied on different conditions. For example,
[14] considered the problem of recovering the support
set of a sparse signal under a stronger condition (also see
[18] for recovery properties under stochastic noise). A similar
analysis was employed in [15], where it was shown that for any
fixed sparse signal $\bar{x}$ with $\bar{k} = \|\bar{x}\|_0$, OMP can recover the
signal with large probability using $O(\bar{k} \ln d)$ measurements.
A more refined analysis in [6] shows that a lower bound
of $n = 2\bar{k} \ln(d - \bar{k})$ measurements is enough for recovery.
However, the above results are not uniform with respect to
all $\bar{k}$-sparse signals $\bar{x}$ (that is, for any set of random projections,
there exist $\bar{k}$-sparse signals that fail the analysis). In
comparison, the RIP condition holds uniformly by definition,
and hence our result applies uniformly to all $\bar{k}$-sparse signals.
Although our result is stronger than previous results in terms of
signal recovery in 2-norm, the result requires running the OMP
algorithm for more than $\bar{k}$ iterations, and hence does not recover
the true support set of the ideal signal. In comparison, results
such as [15] also imply exact recovery of the correct support
set (but under stronger assumptions) using only $\bar{k}$ OMP iterations.
It is also known that it is impossible to uniformly recover
the support set (in $\bar{k}$ iterations) with the OMP algorithm with
$O(\bar{k} \ln d)$ measurements [12]. This means that it is necessary
to run OMP for more than $\bar{k}$ iterations in order to achieve the
best 2-norm recovery performance with as few measurements
as possible.

It is worth mentioning that some previous results apply
uniformly to all $\bar{k}$-sparse signals. For example, some of these results
depend on the stronger mutual incoherence condition.
Unfortunately, the mutual incoherence condition can only be
satisfied with $\Omega(\bar{k}^2 \ln d)$ random projections. Therefore in
recent years there has been significant interest in studying
OMP under the RIP. In addition to the current paper, a number
of recent papers investigated this issue, reaching varying
conclusions |fl~), |0J, (8), (9). For example, the RIP-based 
analysis for sparse signals (but without noise) was considered 
in |01, El, with the conclusion that under a sufficiently strong 
assumption on the RIP constant (in fact, the resulting condition
is similar to the mutual incoherence condition), exact recovery
is possible in $\bar{k}$ iterations. The condition required for the RIP
constant was weakened in [9], where the author showed that
by running the OMP algorithm for more than $\bar{k}$ iterations, it is
possible to achieve exact recovery (again assuming no noise).
The condition in [9] can be satisfied with only $O(\bar{k}^{1.6} \ln d)$
measurements, which is a significant improvement over the
traditional $\Omega(\bar{k}^2 \ln d)$ measurements. The result obtained in
the current paper is along the same line as [9], but reduces
the required number of measurements to the optimal order of
$O(\bar{k} \ln d)$.

It is also interesting to compare the new OMP result in
this paper to that of Lasso, which is also known to work
under the RIP. However, a more refined comparison illustrates
differences between the known theoretical results for these two
methods. For OMP, the result in Theorem 2.1 can be applied
as long as the condition

$$s - |\bar{F} \cup F^{(0)}| \ge 4|\bar{F} \setminus F^{(0)}|\,(\rho_+(1)/\rho_-(s)) \ln(20\rho_+(|\bar{F} \setminus F^{(0)}|)/\rho_-(s))$$

is satisfied. With $F^{(0)} = \emptyset$, this roughly requires
$(\rho_+(1)/\rho_-(s)) \ln(\rho_+(\bar{k})/\rho_-(s))$ to grow sub-linearly as a
function of $s$ in order to apply the theory. In comparison, the
known condition for Lasso (e.g., this has been made explicit
in [17], [19]) requires $\rho_+(s)/\rho_-(s)$ to grow sub-linearly as
a function of $s$. To compare the two conditions, we note
that the condition for OMP is weaker in terms of the
upper convexity constant as there is no explicit dependency on
$\rho_+(s)$; however, the dependency on $\rho_-(s)$ is stronger in OMP
than Lasso due to the logarithmic term. Although it is unclear
how tight these conditions are, the comparison nevertheless
indicates that even though both algorithms work under the RIP,
there are still finer differences in their theoretical analysis:
Lasso is slightly more favorable in terms of its dependency
on the lower strong convexity constant, while OMP is more
favorable in terms of its dependency on the upper strong
convexity constant. We further conjecture that the extra logarithmic
dependency $\ln(\rho_+(\bar{k})/\rho_-(s))$ in OMP is necessary.
In practice, sometimes Lasso performs better while other
times OMP performs better (for example, see experimental
results in [7]). Therefore some discrepancy in their theoretical
analysis is expected. More specifically, for sparse recovery,
one often observes that Lasso is superior when the nonzero
coefficients have a similar magnitude (which happens to be the
case in which the extra $\ln(\rho_+(\bar{k})/\rho_-(s))$ factor is required in our
OMP analysis) while OMP performs better when the nonzero
coefficients exhibit rapid decay (which happens to be the case
in which the extra $\ln(\rho_+(\bar{k})/\rho_-(s))$ factor can be removed from
our analysis). The theory in this paper significantly narrows
the previous theoretical gap between these two sparse recovery
methods by positively answering the open question of whether
OMP can recover sparse signals under the RIP. Therefore our
result allows practitioners to apply OMP with more confidence
than previously expected.
Appendix

We need a number of technical lemmas. Lemma A.3 and
Lemma A.4, key to the proof, are based on earlier work of the
author with collaborators lfl"3l . Q. The first three lemmas use 
the following notation. Let $F, \bar{F}$ be two subsets of $\{1, \ldots, d\}$.
Let $\operatorname{supp}(\bar{x}) \subset \bar{F}$, and

$$x = \arg\min_{z : \operatorname{supp}(z) \subset F} Q(z).$$

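Lemma A.1: We have

$$Q(x) - Q(\bar{x}) \le 1.5\rho_+(s)\|\bar{x}_{\bar{F} \setminus F}\|_2^2 + 0.5\,\epsilon_s(\bar{x})^2/\rho_+(s)$$

for all $s \ge |\bar{F} \setminus F|$.

Proof: Let $x' = \bar{x}_{\bar{F} \cap F}$, then by the definition of $x$, we
know that $Q(x) \le Q(x')$. Therefore

$$\begin{aligned}
Q(x) - Q(\bar{x}) &\le Q(x') - Q(\bar{x}) \\
&= Q(x') - Q(\bar{x}) - \nabla Q(\bar{x})^\top (x' - \bar{x}) + \nabla Q(\bar{x})^\top (x' - \bar{x}) \\
&\le \rho_+(s)\|\bar{x}_{\bar{F} \setminus F}\|_2^2 + \epsilon_s(\bar{x})\|\bar{x}_{\bar{F} \setminus F}\|_2 \\
&\le \rho_+(s)\|\bar{x}_{\bar{F} \setminus F}\|_2^2 + 0.5\,\epsilon_s(\bar{x})^2/\rho_+(s) + 0.5\rho_+(s)\|\bar{x}_{\bar{F} \setminus F}\|_2^2,
\end{aligned}$$

which implies the lemma. The second inequality is by the
definitions of $\rho_+(s)$ and $\epsilon_s(\bar{x})$. The last inequality follows
from the fact that $ab \le 0.5a^2 + 0.5b^2$ with $a = \epsilon_s(\bar{x})/\sqrt{\rho_+(s)}$
and $b = \sqrt{\rho_+(s)}\,\|\bar{x}_{\bar{F} \setminus F}\|_2$. ■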

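Lemma A.2: We have:

$$\rho_-(s)\|x - \bar{x}\|_2^2 \le 2\left[Q(x) - Q(\bar{x})\right] + \epsilon_s(\bar{x})^2/\rho_-(s)$$

for all $s \ge |F \cup \bar{F}|$.

Proof: From

$$\begin{aligned}
Q(x) - Q(\bar{x}) &= Q(x) - Q(\bar{x}) - \nabla Q(\bar{x})^\top (x - \bar{x}) + \nabla Q(\bar{x})^\top (x - \bar{x}) \\
&\ge \rho_-(s)\|x - \bar{x}\|_2^2 - \epsilon_s(\bar{x})\|x - \bar{x}\|_2 \\
&\ge 0.5\rho_-(s)\|x - \bar{x}\|_2^2 - 0.5\,\epsilon_s(\bar{x})^2/\rho_-(s),
\end{aligned}$$

we obtain the desired inequality. The first inequality is by
the definitions of $\rho_-(s)$ and $\epsilon_s(\bar{x})$. The last inequality again
follows from the fact that $ab \le 0.5a^2 + 0.5b^2$, here with
$a = \epsilon_s(\bar{x})/\sqrt{\rho_-(s)}$ and $b = \sqrt{\rho_-(s)}\,\|x - \bar{x}\|_2$. ■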

The next lemma shows that each greedy search makes
reasonable progress. This proof is essentially identical to a
similar result in [13] but with the refined notation used in the
current paper. We thus include the proof for completeness. It
allows the readers to verify more easily that the proof in [13]
remains unchanged with our new definitions.

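Lemma A.3: Let $e_i \in \mathbb{R}^d$ be the vector of zeros except for
the $i$-th component being one. If $\bar{F} \setminus F \neq \emptyset$, then for all
$s \ge |F \cup \bar{F}|$:

$$\min_{\alpha} Q(x + \alpha e_j) \le Q(x) - \frac{\rho_-(s)\|x - \bar{x}\|_2^2}{\rho_+(1)\left(\sum_{i \in \bar{F} \setminus F} |\bar{x}_i|\right)^2} \max(0, Q(x) - Q(\bar{x})),$$

where $j = \arg\max_i |\nabla Q(x)_i|$.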

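Proof: For all $i \in \{1, \ldots, d\}$ and $\eta > 0$, we define

$$Q_i(\eta) = Q(x) + \eta\,\operatorname{sgn}(\bar{x}_i)\,\nabla Q(x)_i + \eta^2 \rho_+(1).$$

It follows from the definition of $\rho_+(1)$ that

$$\min_{\alpha} Q(x + \alpha e_i) \le Q(x + \eta\,\operatorname{sgn}(\bar{x}_i) e_i) \le Q_i(\eta).$$

Since the choice of $j = \arg\max_i |\nabla Q(x)_i|$ achieves the minimum
of $\min_i \min_\eta Q_i(\eta)$, the lemma is a direct consequence
of the following stronger statement:

$$\min_i Q_i(\eta) \le Q(x) - \frac{\max\left(0, Q(x) - Q(\bar{x}) + \rho_-(s)\|x - \bar{x}\|_2^2\right)^2}{4\rho_+(1)\left(\sum_{i \in \bar{F} \setminus F} |\bar{x}_i|\right)^2} \tag{6}$$

with an appropriate choice of $\eta$; this is because

$$\max\left(0, Q(x) - Q(\bar{x}) + \rho_-(s)\|x - \bar{x}\|_2^2\right)^2 \ge 4\rho_-(s) \max(0, Q(x) - Q(\bar{x}))\,\|x - \bar{x}\|_2^2.$$

Therefore, we now turn to prove that (6) holds. Denoting
$u = \sum_{i \in \bar{F} \setminus F} |\bar{x}_i|$, we obtain that

$$\begin{aligned}
u \min_i Q_i(\eta) &\le \sum_{i \in \bar{F} \setminus F} |\bar{x}_i|\, Q_i(\eta) \\
&\le u Q(x) + \eta \sum_{i \in \bar{F} \setminus F} \bar{x}_i \nabla Q(x)_i + u \rho_+(1) \eta^2.
\end{aligned} \tag{7}$$

Since we assume that $x$ is optimal over $F$, we get that
$\nabla Q(x)_i = 0$ for all $i \in F$. Additionally, $x_i = 0$ for $i \notin F$
and $\bar{x}_i = 0$ for $i \notin \bar{F}$. Therefore,

$$\begin{aligned}
\sum_{i \in \bar{F} \setminus F} \bar{x}_i \nabla Q(x)_i &= \sum_{i \in \bar{F} \setminus F} (\bar{x}_i - x_i) \nabla Q(x)_i \\
&= \sum_{i \in \bar{F} \cup F} (\bar{x}_i - x_i) \nabla Q(x)_i \\
&= \nabla Q(x)^\top (\bar{x} - x).
\end{aligned}$$

Combining the above with the definition of $\rho_-(s)$, we obtain
that

$$\sum_{i \in \bar{F} \setminus F} \bar{x}_i \nabla Q(x)_i \le Q(\bar{x}) - Q(x) - \rho_-(s)\|\bar{x} - x\|_2^2.$$

Combining the above with (7) we get

$$u \min_i Q_i(\eta) \le u Q(x) + \eta\left[Q(\bar{x}) - Q(x) - \rho_-(s)\|\bar{x} - x\|_2^2\right] + u \rho_+(1) \eta^2.$$

Setting

$$\eta = \max\left[0, Q(x) - Q(\bar{x}) + \rho_-(s)\|\bar{x} - x\|_2^2\right] / (2u\rho_+(1))$$

and rearranging the terms, we conclude our proof of (6). ■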
A direct consequence of the previous lemma is the following
result, which is critical in our analysis. The idea of
using a nesting approximating sequence has appeared in Q, 
but the current version is improved. The change is necessary
for the purpose of this paper. In the following, $\mu$ can be chosen
as any positive number if $L = 1$.

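Lemma A.4: Consider the OMP algorithm. Consider a positive
integer $L$ and subsets $F_0 \subset F_1 \subset F_2 \subset \cdots \subset F_L \subset \bar{F} \cup F^{(0)}$,
where $F_0 = \bar{F} \cap F^{(0)}$. Assume that $\min_{x : \operatorname{supp}(x) \subset F_j} Q(x) \le Q(\bar{x}) + q_j$
$(j = 0, \ldots, L)$, $q_0 \ge q_1 \ge \cdots \ge q_L \ge 0$, and let
$\mu \ge \sup_{j=1,\ldots,L-1} (q_{j-1}/q_j)$. If $s \ge |F^{(k)} \cup \bar{F}|$ and

$$k = \sum_{j=1}^{L} \left\lceil |F_j \setminus F^{(0)}|\,(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil,$$

then

$$Q(x^{(k)}) \le Q(\bar{x}) + q_L + \mu^{-1} q_{L-1}.$$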



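Proof: Note that for any $\operatorname{supp}(\bar{x}) \subset \bar{F}$ and $\operatorname{supp}(x) \subset F$,
we have when $\bar{F} \setminus F \neq \emptyset$:

$$\frac{\rho_-(s)\|x - \bar{x}\|_2^2}{\rho_+(1)\left(\sum_{i \in \bar{F} \setminus F} |\bar{x}_i|\right)^2} \ge \frac{\rho_-(s)}{\rho_+(1)\,|\bar{F} \setminus F|}.$$

Therefore Lemma A.3 implies that at any $k$ such that
$s \ge |F^{(k)} \cup \bar{F}|$ and $\ell = 0, \ldots, L$, we have either $|F_\ell \setminus F^{(k)}| = 0$
or

$$Q(x^{(k+1)}) \le Q(x^{(k)}) - \frac{\rho_-(s)}{\rho_+(1)\,|F_\ell \setminus F^{(k)}|} \max\left(0, Q(x^{(k)}) - Q(\bar{x}) - q_\ell\right),$$

where we simply replace the target vector $\bar{x}$ in Lemma A.3
by the optimal solution over $F_\ell$, and replace $x$ by $x^{(k)}$. The
inequality, along with $Q(x^{(k+1)}) \le Q(x^{(k)})$, implies that
either $|F_\ell \setminus F^{(k)}| = 0$ or

$$\begin{aligned}
\max\left(0, Q(x^{(k+1)}) - Q(\bar{x}) - q_\ell\right) &\le \left[1 - \frac{\rho_-(s)}{\rho_+(1)\,|F_\ell \setminus F^{(k)}|}\right] \max\left(0, Q(x^{(k)}) - Q(\bar{x}) - q_\ell\right) \\
&\le \exp\left[-\frac{\rho_-(s)}{\rho_+(1)\,|F_\ell \setminus F^{(k)}|}\right] \max\left(0, Q(x^{(k)}) - Q(\bar{x}) - q_\ell\right).
\end{aligned}$$

Therefore for any $k' \le k$ and $\ell = 1, \ldots, L$, we have either
$|F_\ell \setminus F^{(k)}| = 0$ or

$$Q(x^{(k)}) - Q(\bar{x}) - q_\ell \le \exp\left[-\frac{\rho_-(s)(k - k')}{\rho_+(1)\,|F_\ell \setminus F^{(k')}|}\right] \max\left(0, Q(x^{(k')}) - Q(\bar{x}) - q_\ell\right). \tag{8}$$

We are now ready to prove the lemma by induction on $L$. If
$L = 1$, we can set $k' = 0$ in (8) and consider any $\mu > 0$. Since
$Q(x^{(0)}) \le \min_{x : \operatorname{supp}(x) \subset F_0} Q(x) \le Q(\bar{x}) + q_0$, we have

$$Q(x^{(0)}) - Q(\bar{x}) - q_1 \le q_0.$$

Therefore when

$$k = \left\lceil |F_1 \setminus F^{(0)}|\,(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil,$$

we have from (8) that if $|F_1 \setminus F^{(k)}| \neq 0$, then

$$Q(x^{(k)}) - Q(\bar{x}) - q_1 \le \exp\left[-\frac{\rho_-(s)\,k}{\rho_+(1)\,|F_1 \setminus F^{(0)}|}\right] q_0 \le (2\mu)^{-1} q_0.$$

Note that this inequality also holds when $|F_1 \setminus F^{(k)}| = 0$, and
in such case (8) does not apply. This is because in this case
$Q(x^{(k)}) \le \min_{x : \operatorname{supp}(x) \subset F_1} Q(x) \le Q(\bar{x}) + q_1$. Therefore the
lemma always holds when $L = 1$.

Now assume that the lemma holds at $L = m - 1$ for some
$m > 1$. That is, with

$$k' = \sum_{j=1}^{m-1} \left\lceil |F_j \setminus F^{(0)}|\,(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil,$$

we have

$$Q(x^{(k')}) \le Q(\bar{x}) + q_{m-1} + \mu^{-1} q_{m-2}.$$

This implies that when $L = m$:

$$Q(x^{(k')}) - Q(\bar{x}) - q_L \le q_{L-1} + \mu^{-1} q_{L-2} - q_L \le 2q_{L-1}.$$

We thus obtain from (8) that if $|F_L \setminus F^{(k)}| \neq 0$, then

$$Q(x^{(k)}) - Q(\bar{x}) - q_L \le \exp\left[-\frac{\rho_-(s)(k - k')}{\rho_+(1)\,|F_L \setminus F^{(k')}|}\right] (2q_{L-1}) \le (2\mu)^{-1}(2q_{L-1}).$$

Again this inequality also holds when $|F_L \setminus F^{(k)}| = 0$, and
in such case (8) does not apply. This is because in this case
$Q(x^{(k)}) \le \min_{x : \operatorname{supp}(x) \subset F_L} Q(x) \le Q(\bar{x}) + q_L$. This finishes
the induction. ■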

The following lemma is a slightly stronger version of the 
theorem, which we can prove more easily by induction. 



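Lemma A.5: Consider the OMP algorithm. If there exist $k$
and $s$ such that $|\bar{F} \cup F^{(k)}| \le s$ and

$$k = \left\lceil 4|\bar{F} \setminus F^{(0)}| \frac{\rho_+(1)}{\rho_-(s)} \ln \frac{20\rho_+(|\bar{F} \setminus F^{(0)}|)}{\rho_-(s)} \right\rceil,$$

then

$$Q(x^{(k)}) \le Q(\bar{x}) + 2.5\,\epsilon_s(\bar{x})^2/\rho_-(s). \tag{9}$$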



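Proof: We prove this result by induction on $|\bar{F} \setminus F^{(0)}|$. If
$|\bar{F} \setminus F^{(0)}| = 0$, then the bound in (9) holds trivially because
$Q(x^{(k)}) \le Q(x^{(0)}) \le Q(\bar{x})$.

Assume that the claim holds with $|\bar{F} \setminus F^{(0)}| \le m - 1$ for
some $m > 0$. Now we consider the case of $|\bar{F} \setminus F^{(0)}| = m$.
Without loss of generality, we assume for notational convenience
that $\bar{F} \setminus F^{(0)} = \{1, \ldots, m\}$, and $|\bar{x}_i|$ in $\bar{F} \setminus F^{(0)}$ is
arranged in descending order so that $|\bar{x}_1| \ge |\bar{x}_2| \ge \cdots \ge |\bar{x}_m|$.
Let $L$ be the smallest positive integer such that for all
$1 \le \ell < L$, we have

$$\sum_{i=2^{\ell-1}}^{m} \bar{x}_i^2 < \mu \sum_{i=2^{\ell}}^{m} \bar{x}_i^2,$$

but

$$\sum_{i=2^{L-1}}^{m} \bar{x}_i^2 \ge \mu \sum_{i=2^{L}}^{m} \bar{x}_i^2, \tag{10}$$

where $\mu = 10\rho_+(m)/\rho_-(s)$. We have $L \le \lfloor \log_2 m \rfloor + 1$
because the second inequality is automatically satisfied when
$L = \lfloor \log_2 m \rfloor + 1$ (the right hand side is zero in this case).
Moreover, if the second inequality is always satisfied for all
$L \ge 1$, then we can simply take $L = 1$ (and ignore the first
inequality).

We can now define

$$F_\ell = (\bar{F} \cap F^{(0)}) \cup \{i : 1 \le i \le \min(m, 2^{\ell} - 1)\}$$

for $\ell = 0, 1, 2, \ldots, L$.

Lemma A.1 implies that for $\ell = 0, 1, \ldots, L$:

$$\min_{x : \operatorname{supp}(x) \subset F_\ell} Q(x) \le Q(\bar{x}) + q_\ell, \qquad q_\ell = 1.5\rho_+(m) \sum_{i=2^{\ell}}^{m} \bar{x}_i^2 + 0.5\,\epsilon_s(\bar{x})^2/\rho_+(m).$$

Moreover $q_{\ell-1} \le \mu q_\ell$ when $\ell = 1, \ldots, L - 1$. We can thus
apply Lemma A.4 to conclude that when

$$k = \sum_{\ell=1}^{L} \left\lceil (2^{\ell} - 1)(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil \le 2^{L+1}(\rho_+(1)/\rho_-(s)) \ln(2\mu) - 1, \tag{11}$$

we have

$$\begin{aligned}
Q(x^{(k)}) - Q(\bar{x}) &\le 1.5\rho_+(m) \sum_{i=2^{L}}^{m} \bar{x}_i^2 + 1.5\mu^{-1}\rho_+(m) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2 + 0.5(1 + \mu^{-1})\,\epsilon_s(\bar{x})^2/\rho_+(m) \\
&\le 3\mu^{-1}\rho_+(m) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2 + 0.5(1 + \mu^{-1})\,\epsilon_s(\bar{x})^2/\rho_+(m),
\end{aligned} \tag{12}$$

where (10) is used to derive the second inequality.
Now, if

$$2\mu^{-1}\rho_+(m) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2 \le (1 + \mu^{-1})\,\epsilon_s(\bar{x})^2/\rho_-(s), \tag{13}$$

then (12) implies that (9) holds automatically (since $\mu \ge 10$ and $\rho_+(m) \ge \rho_-(s)$),
which finishes the induction. Therefore in the following, we
only consider the case that (13) does not hold, which implies that

$$2\mu^{-1}\rho_+(m) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2 > (1 + \mu^{-1})\,\epsilon_s(\bar{x})^2/\rho_-(s).$$

Now Lemma A.2 implies that

$$\begin{aligned}
\rho_-(s)\|x^{(k)} - \bar{x}\|_2^2 &\le 2\left(Q(x^{(k)}) - Q(\bar{x})\right) + \epsilon_s(\bar{x})^2/\rho_-(s) \\
&\le 6\mu^{-1}\rho_+(m) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2 + (2 + \mu^{-1})\,\epsilon_s(\bar{x})^2/\rho_-(s) \\
&< 10\mu^{-1}\rho_+(m) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2 = \rho_-(s) \sum_{i=2^{L-1}}^{m} \bar{x}_i^2.
\end{aligned}$$

This implies that

$$\sum_{i=m-|\bar{F} \setminus F^{(k)}|+1}^{m} \bar{x}_i^2 \le \sum_{i \in \bar{F} \setminus F^{(k)}} \bar{x}_i^2 \le \|x^{(k)} - \bar{x}\|_2^2 < \sum_{i=2^{L-1}}^{m} \bar{x}_i^2.$$

Therefore $m - |\bar{F} \setminus F^{(k)}| + 1 > 2^{L-1}$. That is, $|\bar{F} \setminus F^{(k)}| \le m - 2^{L-1}$.
It follows from the induction hypothesis that after
another

$$\left\lceil 4(m - 2^{L-1})(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil$$

OMP iterations, (9) holds. Therefore by combining this estimate
with (11), we know that the total number of OMP
iterations for (9) to hold (starting with $F^{(0)}$) is no more than

$$\left\lceil 4(m - 2^{L-1})(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil + 2^{L+1}(\rho_+(1)/\rho_-(s)) \ln(2\mu) - 1 \le \left\lceil 4m(\rho_+(1)/\rho_-(s)) \ln(2\mu) \right\rceil.$$

This finishes the induction step for the case $|\bar{F} \setminus F^{(0)}| = m$.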